19 research outputs found

    Learning Generative Models across Incomparable Spaces

    Full text link
    Generative Adversarial Networks have shown remarkable success in learning a distribution that faithfully recovers a reference distribution in its entirety. However, in some cases, we may want to only learn some aspects (e.g., cluster or manifold structure), while modifying others (e.g., style, orientation or dimension). In this work, we propose an approach to learn generative models across such incomparable spaces, and demonstrate how to steer the learned distribution towards target properties. A key component of our model is the Gromov-Wasserstein distance, a notion of discrepancy that compares distributions relationally rather than absolutely. While this framework subsumes current generative models in identically reproducing distributions, its inherent flexibility allows application to tasks in manifold learning, relational learning and cross-domain learning. Comment: International Conference on Machine Learning (ICML)
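The abstract's key idea of comparing distributions "relationally rather than absolutely" can be made concrete with a tiny sketch: the Gromov-Wasserstein objective compares the pairwise distance matrices within each space under a coupling, never comparing points across spaces directly. This is only an illustration of the distance (restricted to permutation couplings and brute-forced over 3 points), not the paper's GAN training procedure; the point sets and metrics are made up.

```python
# Toy illustration of the relational comparison behind the Gromov-Wasserstein
# (GW) distance: we compare intra-space pairwise distance matrices under a
# coupling, so the two spaces never need shared coordinates or dimension.
import itertools
import math

def dist_matrix(points, metric):
    return [[metric(a, b) for b in points] for a in points]

def gw_cost(C1, C2, perm):
    # GW objective restricted to permutation couplings:
    # sum over point pairs (i, k) of (C1[i][k] - C2[perm[i]][perm[k]])^2
    n = len(C1)
    return sum((C1[i][k] - C2[perm[i]][perm[k]]) ** 2
               for i in range(n) for k in range(n))

# Space A: points on a line; Space B: the same relational structure
# embedded in 2-D -- the spaces are incomparable, their distances are not.
xs = [0.0, 1.0, 3.0]
ys = [(0.0, 0.0), (0.0, 1.0), (0.0, 3.0)]

C1 = dist_matrix(xs, lambda a, b: abs(a - b))
C2 = dist_matrix(ys, lambda a, b: math.dist(a, b))

# Brute-force the best permutation coupling (feasible only for tiny n).
best_perm = min(itertools.permutations(range(3)),
                key=lambda p: gw_cost(C1, C2, p))
print(best_perm, gw_cost(C1, C2, best_perm))  # (0, 1, 2) 0.0
```

Because the 1-D and 2-D point sets share the same internal distance structure, the relational cost of the identity matching is exactly zero despite the mismatched dimensions.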

    Learning Graph Models for Retrosynthesis Prediction

    Full text link
    Retrosynthesis prediction is a fundamental problem in organic synthesis, where the task is to identify precursor molecules that can be used to synthesize a target molecule. A key consideration in building neural models for this task is aligning model design with strategies adopted by chemists. Building on this viewpoint, this paper introduces a graph-based approach that capitalizes on the idea that the graph topology of precursor molecules is largely unaltered during a chemical reaction. The model first predicts the set of graph edits transforming the target into incomplete molecules called synthons. Next, the model learns to expand synthons into complete molecules by attaching relevant leaving groups. This decomposition simplifies the architecture, making its predictions more interpretable and amenable to manual correction. Our model achieves a top-1 accuracy of 53.7%, outperforming previous template-free and semi-template-based methods.
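The two-stage decomposition the abstract describes (graph edits produce synthons, synthons are completed with leaving groups) can be sketched on a toy graph. Everything below is illustrative scaffolding, not chemistry: the "molecule", atom labels, the hand-picked bond deletion, and the placeholder leaving group all stand in for what the paper's neural model predicts.

```python
# Toy sketch of the two-stage retrosynthesis decomposition: (1) apply a
# predicted graph edit (here hand-picked) that splits the target into
# synthons, then (2) complete each synthon with a leaving group.

def connected_components(atoms, bonds):
    """Split an undirected graph (atom set, bond set) into components."""
    adj = {a: set() for a in atoms}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for start in atoms:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        comps.append(comp)
    return comps

# Target "molecule" as a graph; atoms are just labels here.
atoms = {"C1", "C2", "O3", "C4"}
bonds = {("C1", "C2"), ("C2", "O3"), ("O3", "C4")}

# Stage 1: a predicted edit deletes the C2-O3 bond -> two synthons.
edited = bonds - {("C2", "O3")}
synthons = connected_components(atoms, edited)

# Stage 2: attach a (hypothetical) leaving group "LG" to each synthon.
precursors = [comp | {"LG"} for comp in synthons]
print(sorted(sorted(c) for c in synthons))  # [['C1', 'C2'], ['C4', 'O3']]
```

The split into an edit-prediction step and a completion step is what makes each stage individually inspectable, which is the interpretability argument in the abstract.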

    Neural Unbalanced Optimal Transport via Cycle-Consistent Semi-Couplings

    Full text link
    Comparing unpaired samples of a distribution or population taken at different points in time is a fundamental task in many application domains where measuring populations is destructive and cannot be done repeatedly on the same sample, such as in single-cell biology. Optimal transport (OT) can solve this challenge by learning an optimal coupling of samples across distributions from unpaired data. However, the usual formulation of OT assumes conservation of mass, which is violated in unbalanced scenarios in which the population size changes (e.g., cell proliferation or death) between measurements. In this work, we introduce NubOT, a neural unbalanced OT formulation that relies on the formalism of semi-couplings to account for creation and destruction of mass. To estimate such semi-couplings and generalize out-of-sample, we derive an efficient parameterization based on neural optimal transport maps and propose a novel algorithmic scheme through a cycle-consistent training procedure. We apply our method to the challenging task of forecasting heterogeneous responses of multiple cancer cell lines to various drugs, where we observe that by accurately modeling cell proliferation and death, our method yields notable improvements over previous neural optimal transport methods.
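A minimal numeric illustration of the semi-coupling formalism the abstract invokes, under the usual discrete reading: a pair of plans (G1, G2) where G1's row sums match the source masses and G2's column sums match the target masses, so the entrywise ratio G2/G1 encodes creation or destruction of mass along each transport route. The populations, masses, and the particular plan pair are made-up numbers, not the paper's learned neural parameterization.

```python
# Semi-coupling sketch: G1 satisfies the source marginal constraint, G2 the
# target one; mass is not conserved between them, modeling growth and death.
mu = [1.0, 1.0]          # source population: two cell states, mass 1 each
nu = [1.5, 0.5]          # target population: state 0 proliferated, state 1 died

# One valid semi-coupling pair: both plans route state i -> state i.
G1 = [[1.0, 0.0],
      [0.0, 1.0]]
G2 = [[1.5, 0.0],
      [0.0, 0.5]]

row_sums = [sum(row) for row in G1]
col_sums = [sum(G2[i][j] for i in range(2)) for j in range(2)]
assert row_sums == mu and col_sums == nu  # semi-coupling marginal constraints

# Per-route growth factor: how much mass each transported unit gains/loses.
growth = [G2[i][i] / G1[i][i] for i in range(2)]
print(growth)  # [1.5, 0.5]
```

A balanced coupling would be infeasible here (total source mass 2.0 vs. target mass 2.0 happens to match globally, but the per-state change still requires the two plans to differ), which is exactly the failure mode of conservative OT the abstract points to.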

    BBF RFC 105: The Intein standard - a universal way to modify proteins after translation

    Get PDF
    This Request for Comments (RFC) proposes a new standard that allows for easy and flexible cloning of intein constructs and thus makes this technology accessible to the synthetic biology community.

    Neural Optimal Transport for Dynamical Systems: Methods and Applications in Biomedicine

    No full text
    Modeling dynamical systems is a core subject of many scientific disciplines as it allows us to predict future states, understand complex interactions over time, and enable informed decision-making. Biological systems in particular are governed by dynamical processes, with their inherently complex and constantly changing patterns of interactions and behaviors. Single-cell biology has revolutionized biomedical research, as it allows us to monitor such systems at unprecedented scales. At the same time, it presents us with formidable challenges: While single-cell high-throughput methods routinely produce millions of data points, they are destructive assays, such that the same cell cannot be observed twice nor profiled over time. Since many of the most pressing questions in the field involve modeling the dynamic responses of heterogeneous cell populations to various stimuli, such as therapeutic drugs or developmental signals, there is a pressing need to provide computational methods that can circumvent that limitation and re-align these unpaired measurements. Optimal transport (OT) has emerged as a major opportunity to fill in that gap in silico as it allows us to reconstruct how a distribution evolves, given only access to distinct snapshots of unaligned data points. Classical OT methods, however, do not generalize to unseen samples. Yet, this is crucial when, for example, predicting treatment responses of incoming patient samples or extrapolating cellular dynamics beyond the measured horizon. By harnessing the theoretical constructs of OT, this thesis explores and develops neural static and dynamic optimal transport schemes for elucidating the intricate dynamics of biological populations. 
It encapsulates an array of algorithmic frameworks, with contributions to both the understanding and prediction of population dynamics: First, we derive static neural optimal transport schemes capable of learning a map between the unpaired distributions of unperturbed and perturbed cells. These models excel at predicting single-cell responses to varying perturbations, such as cancer drug screens, and generalize the inference of treatment outcomes to unobserved cell types and patients. This has significant implications for personalized medicine, as it allows for the prediction of treatment responses for new patients in large-scale clinical studies. Second, we explore dynamic neural optimal transport formulations that leverage the connections of OT to partial differential equations and gradient flows through the Jordan-Kinderlehrer-Otto scheme, as well as stochastic differential equations and optimal control through the diffusion Schrödinger bridge. These methods then serve as robust tools for reconstructing stochastic and continuous-time dynamics from marginal observations, allowing us to dissect fine-grained and time-resolved cellular mechanisms. This thesis connects a variety of seemingly unrelated concepts into a unified framework, and the presented methodologies offer a computational and mathematical foundation for modeling cellular dynamics. This provides new avenues to understand cellular heterogeneity, plasticity, and response landscapes. Such neural parameterizations of static and dynamic OT that allow for out-of-sample inference lay the groundwork for exciting opportunities to make novel biological discoveries, infer personalized therapies from single-cell patient samples, and push the boundaries of regenerative medicine.

    Proximal Optimal Transport Modeling of Population Dynamics

    Full text link
    Consider a population of particles evolving with time, monitored through snapshots, using particles sampled within the population at successive timestamps. Given only access to these snapshots, can we reconstruct individual trajectories for these particles? This question arises in many crucial scientific challenges of our time, notably single-cell genomics. In this paper, we propose to model population dynamics as realizations of a causal Jordan-Kinderlehrer-Otto (JKO) flow of measures: The JKO scheme posits that the new configuration taken by a population at time t+1 is one that trades off a better configuration for the population, in the sense that it decreases an energy, while remaining close (in Wasserstein distance) to the previous configuration observed at t. Our goal in this work is to learn such an energy given data. To that end, we propose JKOnet, a neural architecture that computes (in end-to-end differentiable fashion) the JKO flow given a parametric energy and initial configuration of points. We demonstrate the good performance and robustness of the JKOnet fitting procedure, compared to a more direct forward method.
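The JKO trade-off the abstract describes, decrease an energy while staying Wasserstein-close to the previous configuration, can be sketched for a single particle: the next state minimizes energy(x) + ||x - x_prev||² / (2τ). The quadratic energy, the per-particle Euclidean proxy for the Wasserstein term, and the inner gradient-descent solver are all illustrative choices; JKOnet's contribution is learning the energy end-to-end from snapshot data, which this toy does not do.

```python
# One JKO proximal step: minimize energy(x) + ||x - x_prev||^2 / (2 * tau),
# solved here by plain gradient descent on the proximal objective.
def jko_step(x_prev, grad_energy, tau, lr=0.1, iters=500):
    x = x_prev
    for _ in range(iters):
        grad = grad_energy(x) + (x - x_prev) / tau
        x -= lr * grad
    return x

energy_grad = lambda x: x          # E(x) = x^2 / 2, minimized at 0
tau = 0.5
x0 = 2.0

x1 = jko_step(x0, energy_grad, tau)
# Closed form for this quadratic energy: argmin = x0 / (1 + tau)
print(round(x1, 4), round(x0 / (1 + tau), 4))  # 1.3333 1.3333
```

Small τ keeps each step close to the previous configuration (a cautious flow), while large τ lets the energy dominate; iterating `jko_step` traces out the discrete flow whose energy JKOnet tries to recover from data.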

    Supervised Training of Conditional Monge Maps

    No full text
    Optimal transport (OT) theory describes general principles to define and select, among many possible choices, the most efficient way to map a probability measure onto another. That theory has been mostly used to estimate, given a pair of source and target probability measures (μ, ν), a parameterized map T_θ that can efficiently map μ onto ν. In many applications, such as predicting cell responses to treatments, pairs of input/output data measures (μ, ν) that define optimal transport problems do not arise in isolation but are associated with a context c, as for instance a treatment when comparing populations of untreated and treated cells. To account for that context in OT estimation, we introduce CondOT, a multi-task approach to estimate a family of OT maps conditioned on a context variable, using several pairs of measures (μ_i, ν_i) tagged with a context label c_i. CondOT learns a global map T_θ conditioned on context that is not only expected to fit all labeled pairs in the dataset {(c_i, (μ_i, ν_i))}, i.e., T_θ(c_i)♯μ_i ≈ ν_i, but should also generalize to produce meaningful maps T_θ(c_new) when conditioned on unseen contexts c_new. Our approach harnesses and provides a novel usage for partially input convex neural networks, for which we introduce a robust and efficient initialization strategy inspired by Gaussian approximations. We demonstrate the ability of CondOT to infer the effect of an arbitrary combination of genetic or therapeutic perturbations on single cells, using only observations of the effects of said perturbations separately.

    Multi-Scale Representation Learning on Proteins

    No full text
    Proteins are fundamental biological entities mediating key roles in cellular function and disease. This paper introduces a multi-scale graph construction of a protein (HoloProt) connecting surface to structure and sequence. The surface captures coarser details of the protein, while sequence as primary component and structure (comprising secondary and tertiary components) capture finer details. Our graph encoder then learns a multi-scale representation by allowing each level to integrate the encoding from level(s) below with the graph at that level. We test the learned representation on two tasks: (i) ligand binding affinity (regression) and (ii) protein function prediction (classification). On the regression task, contrary to previous methods, our model performs consistently and reliably across different dataset splits, outperforming all baselines on most splits. On the classification task, it achieves performance close to the top-performing model while using 10x fewer parameters. To improve the memory efficiency of our construction, we segment the multiplex protein surface manifold into molecular superpixels and substitute the surface with these superpixels at little to no performance loss.

    Recovering Stochastic Dynamics via Gaussian Schrödinger Bridges

    Full text link
    We propose a new framework to reconstruct a stochastic process {P_t : t ∈ [0, T]} using only samples from its marginal distributions, observed at start and end times 0 and T. This reconstruction is useful to infer population dynamics, a crucial challenge, e.g., when modeling the time-evolution of cell populations from single-cell sequencing data. Our general framework encompasses the more specific Schrödinger bridge (SB) problem, where P_t represents the evolution of a thermodynamic system at almost equilibrium. Estimating such bridges is notoriously difficult, motivating our proposal for a novel adaptive scheme called the GSBflow. Our goal is to rely on Gaussian approximations of the data to provide the reference stochastic process needed to estimate SB. To that end, we solve the SB problem with Gaussian marginals, for which we provide, as a central contribution, a closed-form solution and SDE representation. We use these formulas to define the reference process used to estimate more complex SBs, and show that this does indeed help with its numerical solution. We obtain notable improvements when reconstructing both synthetic processes and single-cell genomics experiments.
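The tractability of Gaussian marginals that the abstract exploits has a simple one-dimensional analogue: between 1-D Gaussians, the optimal transport map is affine, T(x) = m1 + (s1 / s0)(x - m0). This is only the zero-noise cousin of the paper's closed-form Schrödinger bridge (which comes with a full SDE representation); the marginal parameters below are made-up numbers.

```python
# Push samples of N(m0, s0^2) through the affine 1-D Gaussian OT map and
# check that the result matches the target marginal N(m1, s1^2) in moments.
import random
import statistics

m0, s0 = 0.0, 1.0    # source marginal N(m0, s0^2)
m1, s1 = 3.0, 2.0    # target marginal N(m1, s1^2)

T = lambda x: m1 + (s1 / s0) * (x - m0)

random.seed(0)
xs = [random.gauss(m0, s0) for _ in range(50_000)]
ys = [T(x) for x in xs]

# The pushed-forward samples should match the target moments closely.
print(round(statistics.mean(ys), 1), round(statistics.stdev(ys), 1))
```

It is exactly this kind of cheap, closed-form Gaussian transport that makes Gaussian approximations attractive as reference processes when estimating bridges between more complex distributions.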